ABSTRACT

The author takes the position that standardized
tests, as presently developed and marketed, do have potentially
positive uses. However, these advantages are outweighed by the tests'
deleterious effects on children and programs. Standardized tests
refer to published, norm-referenced achievement and intelligence
tests which contain specific instructions for administration. This
discussion includes a historical background of standardized testing,
an explanation of the tests themselves, a proposed moratorium on
testing, and suggested alternatives to standardized testing. A
bibliography is appended. (MV)

A Prefatory Comment

It needs to be said at the outset that standardized testing is at the
center of an enormous educational controversy. It has become an
issue as a result, in part, of accountability pressures; but another
strong factor is the growing number of teachers who want increased
control over curriculum, materials, and classroom practices.
Another significant contributor to the conflict is the tension growing
out of the increasing trend toward decentralization in school
districts. Standardized tests, for example, were thought to make
some sense when most students in a given school district were
involved in a common curriculum. But as school districts have begun
to decentralize, to foster alternative methods of education, common
curricular patterns are being threatened. In addition, we are in an era
when "equal educational opportunity" is being affirmed as never
before. That affirmation has brought with it increased understand- 
ing of the ways in which children of the poor, who include a large 
percentage of America's minorities, racial and ethnic, have been 
deprived of equal educational opportunities. The role that 
standardized tests have played in this process fills the literature and 
helps fuel the debate. 

The foregoing background is not meant to be all-inclusive. Many
of us could provide other significant reasons why standardized
testing is in the "eye of the storm." If my orientation were different, I
could, I suspect, suggest (though I do not believe the argument has
much substance) that educators have made standardized testing an
issue to "cover up" their failures, to rationalize the "decline in test
scores." But that is not my orientation! And that is a point I wish to
make very clear to readers of this fastback.

I do not believe that standardized tests, as presently developed
and marketed, have a great deal to contribute to children, teachers,
parents, or schools. Whatever merit they may have, and I believe
they do have some potentially positive uses, is outweighed
significantly by their negative qualities, by their deleterious effects
on children and programs.

Now that I have established my personal orientation, one that
will pervade this fastback, I need to provide some additional
contextual information. Most of my examples will be related to the
elementary grades, principally because this is where the majority of
standardized tests are used. When I speak of standardized tests, I am
referring to tests which are published, norm-referenced, and
administered according to explicit instructions. Popular examples of
standardized achievement tests are the Metropolitan Achievement
Test, the Stanford Achievement Test, the Stanford Test of Academic
Skills, and the Iowa Test of Basic Skills, to name only a few. Examples
of standardized measures of mental ability are the Otis-Lennon
Mental Ability Test, the Stanford-Binet, and the Wechsler
Intelligence Scale for Children. My purpose in listing the foregoing is to
assure as little misunderstanding as possible about what is meant
when I refer to standardized tests and standardized testing.



A Basic Position on Standardized Testing

Standardized testing, essentially a post-World War I phenomenon,
has become commonplace in America. Tests exist for almost every
social human trait imaginable, including intelligence, alienation,
self-concept, maturity, moral development, and creativity. They are
used to select people for admission into and for exclusion from a
wide range of educational programs, private and public projects,
and jobs. Standardized tests affect Americans of all ages, in all fields;
however, they come down most heavily on the young, those
between the ages of 3 and 21. David McClelland suggests that
standardized tests have been so thoroughly ingrained into American
schools that "it is a sign of backwardness not to have test scores in the
school records of children." While I am concerned about the effects
of testing on individuals of all ages, I am especially troubled about its
effects on young children at the primary school level. These are years
when children's growth is most uneven. Not only are there great
differences among individuals, but also within the individual over
time. During this period a large number of the skills needed for
success in school are in rather fluid acquisitional stages.

The most widely used standardized tests attempt to assess
intelligence, language skills, reading readiness, achievement in
various subjects, and, in recent years, one's self-concept. As I have
said, the tests are commonly norm-referenced, so that users can rank
order individuals or groups in relation to a particular, or norm,
population. (Most test publishers claim that the populations used to
establish norms are representative of the general population for
whom the tests have been devised. This is, however, a questionable
claim in many cases.) We have become so accustomed to their use
that we often fail to ask ourselves whether the tests do in fact assess
what they purport to assess, or whether the assumptions that
undergird the claims of the test makers are acceptable.




Some Hard Questions About Standardized Tests

Sheldon White suggests that in using standardized tests we are
involved with "an affair in which magic, science, and myth are
intermixed." He may well be offering an understatement. How many
of us, for example, actually believe that the intelligence and
competence of any individual can be adequately represented by his*
score on any group-administered test? Or that there is one "normal
curve" that can provide a distribution capable of classifying all
children? Such assumptions defy almost everything that we have
come to understand about children's growth.

Even if one fails to take note of the implicit assumptions of the
tests, an examination of the test items ought to cause enormous
concern.** Are they clear? Are they fair? Do they address the
particular educational concerns of teachers of young children? Do
the tests provide useful information about individual children, about
a class as a whole? Do they help young children in their learning? Do
they support children's intentions as learners? Do they provide
parents with essential information about their children? I have
encountered few teachers able to provide an affirmative response to
any of the foregoing questions. They do respond in the affirmative,
however, to the following questions. Do teachers feel any pressure

*For clarity and economy, we use the masculine form of pronouns throughout this
publication when no specific gender is implied. While we recognize the trend away from
this practice, we see no graceful alternative. We hope the reader will impute no sexist
motives; certainly none are intended. - The Editors

**A large number of publications have provided thoughtful critiques of sample test
items from a variety of popularly used standardized tests. See: Deborah Meier, Reading
Failure and the Tests (New York: Workshop Center for Open Education, 1973); Deborah
Meier, Herb Mack, and Ann Cook, Reading Tests: Do They Hurt Your Child? (New York:
Community Resources Institute, 1973); Banesh Hoffmann, Tyranny of Testing (New
York: Collier Books, 1964); National Elementary Principal (March/April, 1975, and
August, 1975).



to teach to the tests? If the tests were not given, would there be fewer
skill sheets and workbooks, a broader range of materials, more
attention to integrated learning? Would teachers prefer to use the
time devoted to standardized testing for other educational activities?
Do teachers feel that they can assess children's learning in more
appropriate ways than through the use of standardized achievement
tests?

Labels That Cripple

I have many other concerns about standardized tests. They have
been used increasingly to make judgments about children. Children
judged to be "below average" are not likely to receive, in most
schools, the kinds of educational opportunities available to children
judged "above average." Placement in remedial and other special
education programs and in lower-level tracks is usually related
closely to test results. Children placed in such settings are often
viewed as failures; expectations tend not to be high for them. And
children in such settings quickly learn to view themselves as failures,
producing little. Children who are labeled in a manner that suggests
limited ability find that their education takes on a narrow focus,
one-dimensional tasks such as skill sheets, workbooks, and drills of one
kind or another being most prominent.

Who are the children who tend most often to be labeled "below
average"? The high proportion of children from lower
socioeconomic populations, which include large numbers of minorities,
represented in special education and lower-level tracks ought to
give us serious pause. Jane Mercer provides rather stark data:
namely, that from 50 to 300% more black and Mexican Americans are
identified as mentally retarded than could be reasonably expected
from their proportion of the population. Our commitments to
democratic practice and equality of educational opportunity should
force us to speak out strongly against a process that consistently
produces such results.

Multiple Patterns, Diversified Programs 

Teachers of young children have long believed that children
come to learning in many different ways, demonstrating in the
process that they have multiple patterns of growth and achievement.
This belief has given direction to programs which are diversified
as to aims and goals. In these programs, children are respected,
regardless of racial background or socioeconomic class. Their
interests have been the basic starting points for learning. Such
developmental programs have tended to support more formal
instruction in reading, for example, only when children are ready
and not because they are 6 years old.

Because teachers in such settings have been committed to
increasing children's opportunities for successful experience and
high levels of self-esteem, many learning options are made available.
The clock then tends not to determine to such a large degree when
children begin and end learning activities. Peer interaction (i.e.,
communication) is encouraged. Integral, rather than peripheral, to
a child's life in the classroom are the creative and expressive forms of
communication that have the capacity for developing feeling, the
most personal of human possessions. (Too often, a teacher does little
with the creative and expressive arts because they don't relate
particularly well to the normative testing programs. They are not
basic enough!) Static expectations for children, rooted in an array of
basal materials and common curricula, do not reflect the diversity
that actually exists and is supported in responsive primary schools.
Yet the standardized tests are rooted in standard curriculum
materials (basal textbooks, syllabi, and state guidelines) that have
predetermined expectations toward which everyone is expected to
work. To actually develop a responsive, developmental classroom
environment is to risk lower scores on many of the standardized
tests. Teachers and children do not need these kinds of external
pressures.

Evaluation Consonant with Purpose 

Does the foregoing suggest that evaluation is not important?
Most definitely not. I do not oppose evaluation; I consider it basic to the
growth of programs, teachers, and children. But evaluation needs to
be embedded in the classrooms. It needs to be consonant with
purpose. Assessing children's growth, for example, is an intense
activity, and it should occur daily, continuously. It is integral to
everything that goes on in a classroom.

Historical Roots

Having now established a basic position, I will proceed to a
discussion of some of the historical roots of standardized testing in this
country. While I would certainly acknowledge that one can
understand standardized testing practice and psychometrics
without any special knowledge of the historical development of
standardized testing, I believe that the history is important if one wishes
to examine the implications of standardized testing and gain
increased understanding of its assumptions.

Historical perspective is especially important to an understanding
of the current critique. In suggesting a historical perspective, I do not
wish to imply that the context today is the same as that which existed
in the early years of standardized testing. The differences are
profound, as will become clear in the narrative which follows.

The beginnings of standardized testing were taking form at the
turn of the last century. It was a time of rapid change in many aspects
of American life. Immigration was reaching new heights, especially
with a heavy influx of southern and eastern Europeans who were
considered less assimilable than earlier immigrants; industrialization,
aided by a growing faith in science and technology, was firmly
rooted; what seemed an uncontrollable urban expansion paralleled
the increased levels of immigration and industrialization; and
schools were under intense pressures to enroll larger percentages of
the school-age population, especially at the secondary school level.

Psychology and education, as areas of academic inquiry, had long
been stepchildren to the more traditional academic fields such as
philosophy and history. Seeking status, they turned increasingly to
science and technology as a base for their inquiry. That practitioners
within these related fields adopted statistical procedures and
scientific methods is not surprising. The standardized tests which
they developed met many of the conditions of science as they were
understood at that time; in addition, and possibly more importantly,
they represented an activity which could support many of the
cultural assumptions of the day.

In a society clinging to egalitarian views, how could the great
differences among people with regard to education, status, and power
be explained? The social Darwinists suggested that such differences
existed because intellectual attributes among individuals (and
groups) varied significantly. The tests quickly bore this out.
Individuals (and groups) could be classified on the basis of "scientific
measures"; selection processes could be established "without any
blemish on democratic philosophy." While I would hesitate to
suggest that social Darwinism was the predominant philosophical
orientation at the beginning of the century, it clearly had
widespread support from the middle and upper classes (who were at that
time principally white, Anglo-Saxon, and Protestant) and was called
upon to buttress the development of standardized tests.

Alfred Binet, an early champion of experimental psychology, is
generally acknowledged as being responsible for legitimating
standardized tests. In the years that followed the passage of
compulsory education legislation in France (1881), Binet raised questions
about the degree to which all children could benefit from regular
school activities. By the turn of the century, Binet advocated special
classes for those possessing "limited ability."

Along with Theodore Simon, Binet was invited in 1904 by the
Ministry of Education to develop an identification process which
might be used to select children for special classes. Simon and Binet
capitalized upon the opportunity to develop a series of tests. The
procedures they employed for norming and validation were not very
dissimilar from those used today. (Another similarity exists in the
results. Then, as now, scores reflected the social/economic
structures.) While Binet was not immune to the use of such phrases as
mental ability in relation to his test results, he had reservations about
interpreting an individual's test results narrowly; he did not believe,
for example, as many of his followers did, that the tests measured
fixed intellectual qualities that were not amenable to further
training.

After Binet's death, Lewis Terman of Stanford University began
revision of the Binet testing process. The Stanford-Binet Test,
published in 1916, was the result. Terman, in order to bring more
specificity to the results, attached a score which he called an
Intelligence Quotient. In The Measurement of Intelligence, Terman
described the "potential" for the test in clear fashion:

. . . [I]ntelligence tests will bring tens of thousands of these high-grade
defectives under the surveillance and protection of society. This will
ultimately result in curtailing the reproduction of feeble-mindedness
and in the elimination of an enormous amount of crime, pauperism,
and industrial inefficiency.

Those with IQ scores in the 70 to 80 range were of particular
concern to Terman. He wrote further in The Measurement of
Intelligence:

[Such intellectual deficiencies are] very common among Spanish-Indian
and Mexican families in the Southwest and also among
Negroes. Their dullness seems to be racial, or at least inherent in the
family stocks from which they come. . . . Children of this group should
be segregated in special classes. . . . They cannot master abstraction,
but they can often be made into efficient workers . . . from a eugenic
point of view they constitute a grave problem because of their
unusually prolific breeding.

(Such arguments, though less direct, are not uncommon in the
literature produced today by Arthur Jensen, Richard Herrnstein, and
William Shockley, to name a few.)

Terman, along with others such as Henry Goddard (Vineland
Training School in New Jersey) and Robert Yerkes (Harvard) who had
similar interests in "tests of intelligence" and later "tests of
achievement," eventually saw southern and eastern Europeans as also
demonstrating mental deficiencies. It is not surprising that all three
were active in the eugenics movement (the tests were for them
excellent detection devices) and in political efforts to stem the flow
of the "inferior" southern and eastern Europeans who were entering
the United States as immigrants.

Goddard went to Ellis Island in 1912 to administer the Binet test, as
well as others which he devised, to new immigrants. The results were
hardly surprising, though to read Goddard's account in Leon
Kamin's The Science and Politics of I.Q., it was shocking: He judged
83% of Jews, 80% of Hungarians, 79% of Italians, and 87% of Russians
as "feeble-minded." In 1912 it was Goddard who also provided the
classic tracing of the Martin Kallikak family, a case history still used in
psychology texts as late as 1955.

When the United States entered World War I, Yerkes, then
president of the American Psychological Association, proposed on
behalf of several of his colleagues (including Terman and Goddard)
that psychologists could perform a service by administering tests to
draftees as an aid in their military placement. Tests were given to
1,726,000 men. While they were not used by the Army for purposes of
placement, they did serve to solidify the legitimacy of standardized
testing and "improve" the technology of psychometrics.

The data from the testing were reported in 1921 by the National
Academy of Sciences as Psychological Examining in the U.S. Army
(edited by R. M. Yerkes). The data were analyzed further in A Study
of American Intelligence by C. C. Brigham (Princeton University
Press, 1923). There were no surprises. Whites scored considerably
higher than blacks; immigrants from Scandinavian and
English-speaking countries scored significantly higher than those from Latin
and Slavic countries. Data showing correlations between the test
scores and the length of time an individual being tested had lived in
the United States were essentially dismissed.

The results of all of this testing were influential in the passage of 
the 1924 Immigration Act, which placed discriminatory restrictions 
on the immigration of non-Anglo-Saxon populations. It was a major 
victory for psychologists such as Terman, Goddard, Yerkes, and
Brigham. In fairness, however, it should also be said that the victory 
was not theirs exclusively. There were millions of Americans who 
were convinced that immigration should be restricted and did not 
need the psychological test scores for confirmation. (Provisions of 
the 1924 act relating to national origin quotas were maintained in the 
McCarran Act of 1952. By the 1950s, however, pressures were 
building to bring an end to all discriminatory legislation. In 1965 
Congress passed the Immigration and Nationality Act, which 
eliminated the quota system based on national origins.) 

Walter Lippmann was one of the few individuals of the time who
raised a voice of protest; his outlet for criticism from 1922 to 1924 was
the New Republic. In an early commentary, he wrote, "The real
promise and value of the investigation which Binet started is in
danger of gross perversion by muddleheaded and prejudiced men."

In the issue of November 15, 1922, Lippmann wrote: 

. . . [I]ntelligence is not an abstraction like length and weight; it is an
exceedingly complicated notion which nobody has yet succeeded in
defining. . . . If the impression takes root that these tests really measure
intelligence, that they constitute a sort of last judgment on the child's
capacity, that they reveal "scientifically" his predetermined ability,
then it would be a thousand times better if all the intelligence testers
and their questionnaires were sunk without warning in the Sargasso
Sea.

Lippmann closed his original series of six articles on the
intelligence tests by suggesting that psychologists back away from their
"pretentious" directions and "save themselves from the humiliation
of having furnished doped evidence to the exponents of the new
snobbery."

Terman's responses to Lippmann deserve mention. They are
similar in content to what is produced by many contemporary
apologists: namely, appeals to scientific authority, ridicule, and
non sequiturs. In one response (November 29, 1922), Terman
suggested, along with considerable ridicule, that Lippmann had
"some kind of emotional complex." Lippmann's response in the
January 3, 1923 New Republic:

Well, I have [an emotional complex about this business]. I admit it. I
hate the impudence of a claim that in 50 minutes you can judge and
classify a human being's predestined fitness in life. I hate the
pretentiousness of that claim. I hate the abuse of scientific method
which it involves. I hate the sense of superiority which it creates and
the sense of inferiority which it imposes.


But Lippmann's charges, though powerful in tone, were not
particularly influential. Tests, those producing IQ scores and those
producing achievement scores, proliferated rapidly in the 1920s and
1930s. They fit many school needs of the day by providing external
procedures to justify promotions in the schools, now more
committed to age-grade patterns than ever before. (That they served
to justify the continued pre-eminence of the privileged in American
society seemed not to be a problem for the majority of educators.)
And, as was noted earlier, they fit the scientific ethos of the period.*
Even the progressives, especially in the twenties, gave passive



*Robert Thorndike and Elizabeth Hagen write: "The testing movement seemed
especially suited to the temper of this country and took hold here with a vigor and
enthusiasm unequaled elsewhere," Measurement and Evaluation in Psychology and
Education (New York: John Wiley), p. 5. Readers should understand that standardized
testing has become in large measure an American phenomenon. It has never had very
much consistent support in the rest of the world.

approval to the testing activities. To attack "science" was not
consistent with their basic approaches to education. In the thirties there
were some shifts in progressive thought. (For example, attempts
were made to have standardized testing examined within the
Progressive Education Association. But the organization was in the
beginning of its intellectual decline by this time.)

I recognize that I have focused on several individuals whose
sociopolitical views figured decisively and, by today's
standards, negatively. This might lead some readers to suggest that
my treatment, at least to this point, has not been balanced. I could
have highlighted, as most texts on measurement and testing do, the
pioneering efforts of Terman and Yerkes, the contribution of Terman
to our understanding of the gifted (he did make a number of
significant contributions), and the efforts made by conscientious
measurement scholars such as E. L. Thorndike (and I will comment
further on Thorndike) to improve testing practice. I could have
alluded (as several of the texts have) to the arguments about the
influence of heredity on intelligence and achievement test scores
and to the concerns about the tests really testing social class. I
might not have mentioned eugenics at all (as most texts do not).
The point is that the contemporary literature produced by
measurement scholars does not provide any balance. My purpose
has been to raise aspects of the history of testing in America that have
not been fully in view; in other words, to bring some balance to the
discussion.

Were there other forces besides eugenics and the scientific ethos
that helped make testing such a popular enterprise? There was a
belief often expressed in the early literature that the tests represented a
democratic, objective process for selecting students for admission
into colleges and into particular professions, for passage to new
grade levels, and for receipt of academic honors. Some egalitarians
argued that a test score removed the possibility of social status or
faulty, prejudiced teacher/school judgment determining one's
entry, for example, into the elite schools. (The elite schools received
the same students after standardized testing as before; but the
illusion of democratic practice survived the twenties and persisted
well into the 1950s and early 1960s.) How many times have we heard
about the child who came from a lower socioeconomic and minority
background who was singled out because of high test scores and
given academic opportunities that might not otherwise have been
available? There have been many such cases, but when we consider
all of the individuals of lower socioeconomic and minority
backgrounds who owe increased levels of academic opportunity and
altered social or economic status to test scores, the numbers were
surely small in the decades between 1920 and 1950, and there is little
to suggest that conditions have changed.

Ralph Tyler, in his critique on testing, comments that
standardized testing began "as a means for selecting and sorting people, and
the principles and practices of testing that have been worked out
since 1918 are largely the refining of means to serve these functions
rather than other educational purposes." Sorting and selecting, Tyler
would suggest, were viewed in the early period as natural and
necessary functions of the schools; hence, why would tests designed for
such purposes be questioned seriously?

A stable force in the early testing movement was E. L. Thorndike
who, in the long run, may well have had a greater influence than the
eugenicists. His Introduction to the Theory of Mental and Social
Measurements, published in 1904, was an important contribution to
the measurement field. The Thorndike Handwriting Scale, produced
in 1909, was the first popularly used standardized test in the public
schools.

"Whatever exists at all exists in some amount" was a classic
Thorndike phrase and one which says a great deal about Thorndike's
basic approach to testing. In "Nature, Purposes and General
Methods of Measurement of Educational Products" (The
Measurement of Educational Products, Seventeenth Yearbook, Part
II, National Society for the Study of Education, 1918), Thorndike
commented about some of the problems that were beginning to
surface as early as 1917-18: "[Those] directly in charge of educational
affairs have been so appreciative of educational measurement and so
sincere in their desire to have tests and scales devised [that quality is
sacrificed]. . . . Opposition, neglect, and misunderstanding will be
much less disastrous to the work of quantitative science in education
than a vast output of mediocre tests for measuring this, that, and the
other school product, of which a large percent are fundamentally
unsound. . . ."

Thorndike was a serious student of measurement. Eugenics was
not his interest. He continued to raise concerns throughout the
1920s, a veritable boom period for the development and marketing
of tests, about the poor quality of tests, the uncritical acceptance of
test scores, and what he conceived to be the unjustified judgments
being made about individuals. His goal was the production of better
tests and more knowledgeable test use. While it is possible to
disagree with Thorndike's basic assumptions about education and
learning, one must, as I do, respect the way he carried out his
commitments.

The technical quality of tests improved in the decades following
the Depression. And these improvements are noted often in the
educational literature. The eugenics advocates who were prominent
in the formative years were gone, to be replaced by a growing corps
of psychometric technicians. Norming and validation procedures
became increasingly more sophisticated. And testing became a part
of the conventional wisdom of schools. Debates were few; criticism
almost nonexistent. (In fact, the literature includes very little
criticism until the 1960s.) It should be noted that the standardized
tests, while accepted and used as a basis for many positive and
negative decisions about students, were not viewed as overbearing.
They did not dominate curriculum or teaching. The amount of
testing was not great in most districts, and had what appeared to be a
benign quality in many others.

Standardized testing received a boost with the ascent of Sputnik I,
when questions about the quality of schools began to accelerate. In
1965, with the passage of the Elementary and Secondary Education
Act, testing, and the industry supporting it, began to expand rapidly.
With the heavy influx of federal dollars came increasing demands for
evaluation. And evaluation in most instances became synonymous,
unfortunately, with outcome data produced by standardized tests. In
part this occurred because standardized tests and the technology
supporting them were available and evaluation paradigms which
might have been more appropriate were not well developed, or
lacked the narrow "scientific construct" that was increasingly
demanded by a "single-score" mentality.

We are now in a period where standardized tests are a major issue
in schools. (It should be noted again that intelligence tests are not as
much an issue now as they once were. Their use is diminishing
rapidly.) The level of criticism is related closely to the volume of
testing that occurs. Producers of tests have been surprised at the
harshness of the criticisms, feeling that the technical quality of their
products is higher now than at any time in the past. They argue that
the problems (and they now acknowledge that there are problems)
are related to use, or more specifically, misuse.*

While these are certainly considerations, the issues are, from my
point of view, deeper than use and misuse. We are seeing teachers,
and in many settings parents, going back to fundamental questions
about the purpose of the schools, the ways in which children's
development is being supported. This is, in large measure, what part of
this fastback is about, and it underlies much of what I am trying to
communicate.



providing test users with very carefully prepared manuals relating to their tests. These
manuals tend to point out painstakingly the particular test's limitations as well as
constructive uses. They typically caution, in the case of achievement tests, that the test
results ought not to be used to evaluate teachers, that grade-level equivalency scores
are misleading, that growth is not unidimensional, that many external factors might
affect a child's score, that the test measures a sampling of curriculum only, etc. But,
having said all of this, it needs also to be stated that few teachers have ever seen the
manuals, and my experience is that few schools act on the cautions that the manuals
provide. This point will be discussed more fully later in this text.



The Tests Themselves 

Having provided a basic position statement and historical review, I
will address several issues which relate generally to the technical 
aspects of testing and which tend to create misunderstandings and 
often misuse. I make no attempt to be all-encompassing; the 
problem areas are just too large. 

In discussions of standardized testing one often confronts
terms/concepts such as objectivity, standardization, reliability, and
validity. What do they mean, however, in basic, nontechnical language?

A test is considered objective if everyone takes it under the same
basic conditions. The multiple-choice format, buttressed by a single
"right"-answer pattern, supports objectivity. But objectivity has
nothing to do with whether a test is fair, contains items of
importance, or has ambiguous questions and answers. Objectivity
has, in other words, no relationship to quality.

A test is standardized if norms have been established. Whether 
the norm populations are representative in more than a statistical 
sense is not the defining characteristic. This term, as in the case for 
objectivity, has no relationship to quality. 

Reliability relates to the consistency of the test. How close are the
results for an individual or group at two different testings? Or how
close are the scores of individuals on two different forms of a
particular test? Reliability is rather simple to establish. But a test can
have very high reliability (most popular standardized achievement
tests carry reliability coefficients of .87 to .93) and yet be a very poor
test, measuring little considered important to large numbers of
people who use or take the test. The latter point relates to validity,
which has an interdependent relationship with reliability. Validity
doesn't, however, receive as much attention.
Validity refers, at the most simple level, to the degree to which a
test measures what it is supposed to measure and/or the degree to
which the scores derived from a particular test can be related to what
the test is supposed to be measuring. In other words, what are the
inferences that can be drawn from the test scores? Validity, unlike
reliability, is difficult to establish with any authority. It is typically
determined by having an expert (or experts) examine a particular test
and provide the equivalent of an imprimatur. This is content
validation by opinion. Content validation is the focus of most
standardized tests used in elementary and secondary schools. (Those
who suggest that the tests often contain bias are, in essence,
questioning content validity by claiming that the test content is not
representative of the socioeducational [illegible] of minority
people.) Validity [illegible] is also [illegible] comparing
results with other measures, i.e., other tests, grades, or teacher
judgments.

So much for the terms. What can be said about test content? For
many individuals with reservations about standardized tests, the
content is the principal issue. This relates to the concept of validity
but is, at the same time, broader. Most currently used standardized
achievement tests, as was noted in my prefatory comment, have
been constructed to conform to instructional programs with
predetermined objectives and materials which everyone is expected
to work through. They have less relationship to programs which
stress high levels of individualization and flexibility of objectives.

How are the tests constructed? In preparing items for an
achievement test, authors typically survey curricular patterns and basal
materials; they attempt to learn about the sequence, if any, that
tends to exist in various content/subject areas; and they make
decisions about how to establish a balance between information
items and concept items. Questions are prepared and generally tried
out in a variety of school settings. (The particular items selected to try
out represent, in effect, a statement about what the test authors
consider important. This is not intended as a negative observation; it is,
however, a condition that needs to be understood.) Those items
which most individuals get right or wrong are discarded.
Distribution is desired, inasmuch as the entire battery of items is designed
to produce, among a sample of students who take the test, a normal
curve, a construct in which half the students score below the average
and half above. In such a process, items which many would suggest
are important might well be discarded and items of limited
importance retained. Teachers, school administrators, and parents
would do well to examine closely the questions which appear in the
standardized tests used in their schools, or being considered for use,
in order to make a judgment about the importance of the questions
and their relationship to the local curriculum.

Once the items are developed, standardization procedures begin.
This [illegible] a norm population and constructing
norm scores. How long does all of this take? From start-up to
publication, [illegible] might pass. But what if curriculum changes
are rapid and/or new goals emerge in schools? Might there then be a
gap between the curricular assumptions of the tests and the curricula
that actually exist? This certainly is the case in mathematics, where
math tests have been concerned for the past decade, as they were in
the previous decade, with computation in a base 10 system only,
hardly evidence of the "new math." What if teachers really believed
that a shift in educational direction was necessary? Is it possible that
such a shift might not occur because of the risk of lowering results on
an achievement test designed for different purposes? And what if the
populations taking the tests change significantly from one
standardization to another? Do the scores derived then really have the same
meaning?

It is important, I believe, to comment briefly on test statistics, the
derivations which bring scores relating to a norm population and
provide a basis for giving meaning to the raw scores (the number of
"correct" responses on a test or subsection of a test). While it could
be argued that everyone using tests should know a good deal about
derived scores and their meaning, my experience is that far too many
do not.*

*I must note that the test manuals accompanying most popularly used standardized tests
are replete with information about derived scores, i.e., how to interpret them and what
limitations need to be taken into account when using them. But as noted earlier, the
manuals are not often available in schools for teachers to read, and little information
about derived scores goes out to the public to increase their understanding.

Test results are most often reported as percentile scores, stanine
scores, or grade-level equivalency scores. Forty-three correct
answers out of 80 items on the language section of a very popular
achievement test (1974 version) which students take at the end of the
seventh grade is converted to a percentile score of 52. This indicates
that 52% of those who took the test as part of the norm population
scored 43 or less and 48% scored higher than 43. Stanine scores,
unlike percentile scores and grade-equivalency scores, are
suggestive of a range. (For this reason test publishers increasingly are
encouraging their use.) All raw and percentile scores are grouped to
make up a nine-point, or stanine, scale. A stanine score of 5 is average;
40% of the scores will then fall above this average and 40% below. On
the test cited above, the percentile score of 52 falls within the fifth
stanine, along with all percentile scores between 40 and 58. The
diagram below presents the range of raw scores converted to
percentiles and then to a stanine scale.

Stanine         1      2      3      4      5      6      7      8      9
Percentile    1-3   4-10  11-22  23-39  40-58  59-76  77-88  89-94  95-99
Raw Score   14-17  18-22  23-28  29-36  37-46  47-55  56-63  64-68  69-79
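
The percentile-to-stanine grouping in the diagram can be sketched as a simple lookup. This is an illustration only, not the publisher's procedure: the function name and the band list are assumptions introduced here, the bands are read from the diagram for this one test, and the lowest band is taken as 1-3 so that the bands are contiguous.

```python
# Percentile bands for stanines 1-9, read from the diagram above.
# Assumption: the lowest band is taken as 1-3 so the bands are contiguous.
STANINE_PERCENTILE_BANDS = [
    (1, 3), (4, 10), (11, 22), (23, 39), (40, 58),
    (59, 76), (77, 88), (89, 94), (95, 99),
]


def percentile_to_stanine(percentile):
    """Return the stanine (1-9) whose band contains the percentile score."""
    for stanine, (low, high) in enumerate(STANINE_PERCENTILE_BANDS, start=1):
        if low <= percentile <= high:
            return stanine
    raise ValueError("percentile must be between 1 and 99")


# The percentile score of 52 discussed in the text falls in the fifth
# stanine, as does every percentile score from 40 through 58.
print(percentile_to_stanine(52))  # 5
```

Percentile scores of 40 and 58 land in the same stanine, which is why a stanine is better read as a range than as a point.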

The grade-level equivalency score is derived essentially by
assigning to the median score of a seventh-grade norm population a
grade-level equivalency of 7.0. Scores above and below the median
are assigned grade-level equivalencies above and below 7.0. It is an
estimation, nothing more. The score of 43 on the language section of
the test under discussion (taken at the end of the seventh grade)
converts to a percentile score of 52 and a grade-level equivalency score
of 8.3 (eight years, three months). A score of 41 converts, on the other
hand, to a grade-level equivalency of 7.8 and 45 converts to 8.7. Two
questions right or wrong cover a range of nine months.

The publisher of the foregoing test lists a standard error for the
language section of 3.9. This standard error indicates that two-thirds
of the time one could expect a fluctuation of 3.9. As the test manual
notes, "We could expect with about 68% certainty that the true
score [for a student with a raw score of 43] would fall between [39
and 47]." This is between the 44th and the 60th percentiles; the
grade-level equivalency range is 7.2 to 8.9. And one-third of the time
there may be even more error. The point of all of this is that the
scores are very imprecise; one has to be very careful in attaching
too much importance to them.
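
The arithmetic behind the standard error band can be checked directly. The sketch below is only an illustration under stated assumptions: the function names are hypothetical, the band is the raw score plus or minus one standard error rounded to whole items, and the grade-level equivalencies are the few values quoted in the text, with each tenth of a grade treated as one month.

```python
# Grade-level equivalencies quoted in the text for the language section
# (raw score -> grade equivalent). Only quoted values are listed; nothing
# is interpolated between them.
GRADE_EQUIVALENTS = {39: 7.2, 41: 7.8, 45: 8.7, 47: 8.9}


def score_band(raw_score, standard_error):
    """Return (low, high): raw score plus or minus one standard error,
    rounded to whole items. About two-thirds of the time the true score
    is expected to fall inside this band."""
    low = round(raw_score - standard_error)
    high = round(raw_score + standard_error)
    return low, high


def months_between(low_equivalent, high_equivalent):
    """Months separating two grade equivalents (assumes 0.1 grade = 1 month)."""
    return round((high_equivalent - low_equivalent) * 10)


low, high = score_band(43, 3.9)
print(low, high)  # 39 47
print(months_between(GRADE_EQUIVALENTS[41], GRADE_EQUIVALENTS[45]))  # 9
print(months_between(GRADE_EQUIVALENTS[low], GRADE_EQUIVALENTS[high]))  # 17
```

On these figures, and on the tenths-as-months reading, one standard error either side of a raw score of 43 spans roughly seventeen months of grade-level equivalency.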

Of all of the derived scores, grade-level equivalency is the most
commonly used, even though it is the most misleading. Test
publishers now regularly, in their manuals, point out that
grade-level equivalency scores "are being questioned as an appropriate
means of interpreting the test performance of individuals and
groups." They suggest further that "grade equivalents are not an
equal-unit score scale . . . statistical computations based on
grade-level equivalency values are not, strictly speaking, legitimate." In
some cases they admonish users not to report grade-level
equivalencies at all. Henry Dyer, possibly the most respected authority in
testing, has called grade-level equivalency scores "absurd, wrong,
and misleading." And in even stronger language he commented
(The United Teacher, April 14, 1971, p. 15) that they are "statistical
monstrosities . . . [that they] lure educational practitioners to
succumb to what Alfred North Whitehead called the 'fallacy of
misplaced concreteness.'" But grade-level equivalencies continue, in
part because school people have been lured, with many
accountability models, to measure growth (or "misplaced concreteness"). If
children are in school for eight months, "then they should make
eight months' gain." Only grade-level equivalency scores report in
year-month terms.

In one school district in which we recently conducted a review of
the accountability system, the following appears in the Statement of
School District Objectives: "Students [are] expected to gain, on the
average, eight months in academic achievement between
September and May." And in a booklet related to the particular
standardized testing program used in the school district, the following is
presented: "A student is expected to grow academically one month
for each month that he/she is in school. Since there are eight months
between the administration of the pretest and the posttest, it is
desirable for the student to gain eight months during the time."

I won't describe what teachers engage in in order to achieve eight
months' gain, but to enter into this kind of gain score mentality one
has to again misrepresent the tests and abuse the scoring system.
Most test publishers make reasonably clear (though they are not as
direct as they could be) that gain scores are "fraught with
problems," inasmuch as the tests were originally constructed only to
identify present status. (And even the problems of interpreting
present status are immense, according to the test manuals.) In order
to address the question of eight months' growth, one has to assume
that growth is unidimensional and linear and that eight months of
schooling corresponds empirically with eight months' gain on a
standardized test. Neither is the case!

What does it mean to learn that a third-grade child (or class) is
reading at a 7.9 grade level? Does it suggest, as I have often heard or
read in the newspapers, that this particular child (or class) is reading
as well as average-achieving youngsters completing the seventh
grade? To begin with, what the tests measure as reading ability in
grade three is not necessarily the same thing as reading ability
measured in the seventh grade. A score of 7.9 is nothing more than
an extrapolation above the mean. It has in no real sense anything to
do with how well a youngster completing the seventh grade reads.
Conversely, what interpretation is to be made for a seventh-grade
child with a grade-level equivalency score of 4.0 on a reading test
designed for seventh-graders? It implies clearly that the youngster's
reading score is much lower than average-achieving
seventh-graders who were part of the norm population. But does it establish
that the youngster reads only as well as a fourth-grader? The chances
are that if this seventh-grader took the test designed for
fourth-graders his grade-level equivalency score might be 7.0. Remember,
the tests were normed at particular grade levels. Third-graders didn't
take a test designed for seventh-graders and seventh-graders didn't
take a test designed for fourth-graders. We are contending at best
with a statistical construct. Yet the use of grade-level equivalency
scores goes on unabated.

Much of this section has dealt with test scores in a broad sense. 
The problems, however, grow as the context narrows and the tests
are used to say something about an individual student's achievement
of particular skills or to determine specific instructional needs. (Even
the test producers are promoting some of this direction.) But given
the content sampling that is involved, the manner in which the tests
have been constructed, the paper and pencil multiple-choice
format, and the sources which exist for error, this isn't an eminently
valid use. In order to make decisions about individuals, individual
student scores on specific test items or subparts of a test must be
used. Even from a technical perspective this is a significant problem.
Let us examine some of the sources of error. The health of a child on
the day the test is given can affect the score. Noise in the classroom,
teacher attitudes toward the test, whether a child has taken similar
tests, a broken pencil, and any number of similar disturbances can
influence a particular score. The mental state of a child (depression
[illegible]) can also make a difference.
Simple mechanical errors such as marking the wrong box on the
[illegible], overlooking a question, or missing a word [illegible]
reading are relatively common test-taking problems. Children
experiencing difficulty with reading will perform poorly [illegible];
but they will also [illegible] the
social studies, science, and math sections of achievement tests
[illegible] reading skills. Thus a child's real
knowledge may be considerably underestimated.

Any of the sources of difficulty outlined above (and the surface
has only been scratched) can affect an individual's score. And
[illegible] have little to do with how "good" the test
[illegible] was prepared, or the validity of its content.
[illegible] are, in general, intrinsic to the nature of standardized
[illegible] described briefly above? It
[illegible] how the test results are used. In reports
[illegible] test scores for large groups of children, it is possible to expect
many of the mechanical errors and related difficulties
to balance out; some children will score above their "true"
[illegible] the more likely
[illegible] that such a balancing will occur. But for a single individual no
[illegible] score exists. Nothing can compensate for error when a single
[illegible].

Inasmuch as reading tests are common and are used for
placement or for establishing skill assignments, I will offer some
general comment on them. The criticisms that I offer, however,
apply to the reading section of a general
standardized achievement test as well as to other subject areas. At
the lower grade levels, reading tests are heavily dependent upon a
particular vocabulary. If a youngster's vocabulary does not include
many of the words in the test, are we really to assume the problem is
reading? In addition, many of the questions which appear depend
on information that is not provided. A child can read the items and all
of the responses and then select the "wrong" answer. The problem is
a lack of information and not a lack of reading skill. It is also possible,
of course, given the ambiguities in many items, for a child to select
the "wrong" answer but read very well. Another issue is the obvious
cultural bias that appears in reading tests, especially at the primary
level. At this level, the tests make heavy use of pictures which often
reflect particular experiences that are not necessarily part of the
experience of large numbers of children in the United States.

The foregoing, and similar critiques that could be offered, are
possibly small issues. A more serious question, at least for those
interested in reading as an area of inquiry, is related to the
assumptions which underlie most standardized reading tests. In general,
reading tests assume a hierarchy of specific skills. Exercises relating to
words in isolation, decoding, syllabication, and the like are common.
But there is no agreement among reading experts that any hierarchy
of skills exists.* Many of the skill sheets that children are seen
struggling with are related directly to a hierarchy of skills. In fact,
several of the tests have correlative materials which can be assigned
to students who score poorly on a particular skill area. There is some
evidence that such activities will increase scores on reading tests, but
there is little evidence that such activities enlarge a child's capacity to
gain understanding from the printed page (the way in which many
individuals define reading). The time taken doing skill sheets on
syllabication, for example, might have been better spent reading,
enlarging one's experiences with words in new contexts, etc.

Having raised in an indirect manner the issue of time, I wish to
pursue it further. Testing takes time. Does it add significantly to a
child's learning? Or does it take time away from other, more
significant, learning experiences?

*A group of reading experts came together under the auspices of the International
Reading Association in 197[?] to discuss reading and reading tests. They agreed almost
unanimously that the existing norm-referenced reading tests were without a theoretical
base. They agreed further that there is "no definitive knowledge regarding either the
sequential learnings or component skills that children must acquire in order to read
successfully."

In many schools actual standardized testing time for most
children takes four days in the fall and four days in the spring. But
how much time goes into preparing children directly for the tests
themselves? And if a child is "targeted" under Title I, he is likely to be
in for another dose of pretests and posttests in reading and math. If
the "targeted" child is also in a Follow Through program, he may
well receive another battery of tests related to the National
Evaluation Project as well as other batteries mandated by the Follow
Through sponsor. The possibilities proliferate the more one thinks
about testing in schools.

What is learned through all of the testing? The question that must
always be asked in addition to all that has been said is: Do the tests
provide more information about a child's achievement in most
subject areas than the child's teachers typically possess? In general,
no! Teachers can, in most cases, provide more precise information to
a parent about the quality of a child's reading or math skills than
any standardized test score can. Do the test scores inform teachers
about what they should do? There is nothing inherent in the tests or
the scoring mechanisms that provides a capacity for informing
teachers of what they should do.
A Moratorium?

Thus far I have provided a position statement, some historical
background, and a brief introduction to test characteristics about which it
is important to know something. I now wish to address a question
that looms large on the horizon of the standardized testing
controversy: namely, ought there to be a moratorium?

To raise the question of a moratorium among teachers, school 
administrators, parents, school board members, and legislators 
appears to elicit fear, even when there is a negative orientation
toward standardized testing. A moratorium seems for many to be an
ultimate step that might throw education into a chaotic state. Such is
the authority that standardized testing has come, over the years, to
wield. Nonetheless, the gauntlet has been thrown. The National
Education Association passed a resolution calling for a moratorium
on the use of standardized intelligence and achievement tests in
1972. In the past year, after several years of relative indifference to
the resolution, the NEA has become aggressive in its support of a
moratorium. The National Association for the Advancement of
Colored People issued a moratorium statement in May, 1975. The
Association for Childhood Education International gave support to a
moratorium in 1976, and the Association for Supervision and
Curriculum Development, American Association of School
Administrators, National Association of Elementary School Principals,
and the National Council of Teachers of English, while not calling
directly for a moratorium, from 1974 to 1976 used particularly
vigorous language in agitating for a reconsideration of all uses of
standardized intelligence and achievement tests.*

There are many individuals and groups who share the
perspectives of the organizations listed above but feel that a
moratorium, if there is to be one, should be aimed exclusively at
group-administered intelligence and achievement testing. They hold to a
belief that tests yielding normative scores can be used "if the tests are
administered on an individual basis by a skilled examiner who makes
sure that the child understands what he is supposed to do and wants
to do it," says Millie Almy in The Early Childhood Educator at Work.
Such a position seeks too much! A moratorium, I believe, has the
potential for encouraging the development of, and legitimization
for, alternatives to existing standardized testing practices. This is a
crucial direction, inasmuch as evaluation, as noted earlier, is clearly
essential to the qualitative improvement of educational practice in
schools and the learning of children and young people. A
moratorium also holds promise for intensifying critical
reexamination of the politics of testing, the problems of misuse,**
and the negative effects of standardized testing practice on children,
teachers, and programs. The more deeply the foregoing are
understood, the higher will be the potential for future reform
efforts.

*It should be noted that group-administered intelligence tests have been banned in
California and New York and that legislation prohibiting all intelligence testing is
pending in Massachusetts. Legislation of this sort may well proliferate, inasmuch as
intelligence testing is even being questioned by testing proponents. Henry Dyer has
suggested that the scores derived from intelligence tests are "dubious and based upon
an impossible assumption about the equivalence of human experience and the
opportunity to learn." William Turnbull, president of the Educational Testing Service,
commented at a symposium on testing (Arlington, Virginia, May 7, 1976) that the sooner
we end the use of all so-called intelligence tests, the better.

**Test publishers, as has been noted, acknowledge much of the misuse that occurs; a
moratorium might provide time for test publishers to reestablish the authority of their
efforts, enabling them to develop procedures that assure proper use of their tests and
also to enter into collaboration with those who are developing alternatives to
norm-referenced evaluation procedures.

Would a moratorium on standardized testing disrupt school
practice and bring an end to all evaluation? There is no reason to
believe that either would occur. Many school districts do not use any
standardized testing program; yet their evaluation practices are
intense. Will standards decline? There is no evidence that standards
have any relationship to the use or lack of use of standardized
measures. One could even argue, I suspect, that school standards in
the United States have declined as standardized tests have increased
in use.

Would a moratorium on standardized tests cause schools to fall
back on "unsystematic evaluation processes," fostering an increase
in "discrimination and ignorance?" Such an argument is popular but
has little, if any, empirical data to support it. A moratorium would
provide an excellent opportunity to assess such a belief.

To call for a moratorium is, for the most part, an appeal to moral
authority. It can't really be more than that. Few wish to see federal or
state legislation or court orders as the base for the reexamination that
cries out for attention.
Alternatives to Standardized Tests

This brings me to the concluding section of this fastback and a
discussion of alternatives to standardized tests.

How might teachers and schools proceed with an evaluation
program that does not include standardized tests? Some alternative
directions follow. They are clearly not all-inclusive; and many are, in
fact, merely reaffirmations of practices that many teachers engaged
in before they experienced the disruptive pressures of increasing
numbers of standardized tests.

Supporters of standardized tests often argue that the tests are
"objective" measures that serve as a check on the "subjective,"
inadequate assessments made by teachers. (Yet, interestingly
enough, one source for validity checks of many standardized
measures has been teacher judgment.) I do not accept the
assumption that teachers have inadequate record-keeping and
assessment skills. When standardized test supporters acknowledge
that teachers do possess some of these skills, they argue that the tests
are necessary because teachers and schools are not often organized
sufficiently to describe children's learning or school programs. It is
true that to engage in a systematic process of documentation is to
expend considerable effort. Fortunately, increasing numbers of
teachers at all levels wish to make such an effort.

Systematizing Documentation

What might a group of teachers in a given school want to look at?
What might they view as especially important to document? Answers
need to come from the local school. (Individuals external to the
school have typically determined what it is important to document.
And such a process has contributed significantly to the negative
character which evaluation has tended to assume.) I do not mean to
imply that for a school as a whole (or particular clusters within a
school) to make such decisions it is necessary to have standard
record-keeping procedures in all areas in every classroom; I do
mean that there must be some consensus about what areas will be
looked at closely by a group of teachers. Where such a consensus
exists, individual teachers not only receive the support of their
colleagues, but they also have others with whom to share their
documentation and reflection. Moreover, such a condition provides
a climate in which teachers can feel comfortable while observing
each other's classrooms, interviewing each other's students, and
seeking and providing assistance.

Process, Content, Context 

In documenting the process of learning, teachers in a school
might wish to include information about the children's originality,
responsibility, initiative, and independence of effort. In relation to
the content of learning, they might wish to consider materials the
children produce (such as writings and drawings); evidence that
instruction deals with important concepts as well as necessary skills;
and evidence that children find meaning in their learning, that it is 
not merely rote. And in relation to the context of learning, they 
might consider the basic human relationships that exist — child to 
child, child to teacher, and teacher to teacher— and see how much 
respect there is for the efforts and feelings of others. 

The Prospect School in North Bennington, Vermont, uses some 
of the following records for its basic documentation: children's work 
(for example, drawings and photos); children's journals and 
notebooks of written work; teachers' periodic assessments of 
children's work in math, reading, and other activities; curriculum 
trees; and sociograms. The documentation is so complete that few
individuals ask about a standardized test score. A synthesis of these
records with precise statements regarding work in math, literature,
reading, etc., is prepared at the conclusion of each year. This
provides the subsequent year's teacher with rather full information
about where to begin with a child. It should be noted, however, that
within the school communication among teachers is sufficiently high
to enable teachers to go beyond the year-end statements to the
fuller documentation that is available in relation to each child. The
year-end statements serve as the record for a youngster who
transfers to another school. These are far more precise operationally than
test scores and are typically viewed as more helpful.



Interviews

At the University of North Dakota a process for documentation
has been developed that includes interviews conducted with
teachers, children, and parents.* These interviews have been used
extensively as a base for program evaluation and staff development.
The teacher interview provides a context for individual teachers to
reflect on their intentions, use of materials, relationships with
children, organization of time and space, difficulties, successes, and
so on; in other words, the teacher's own perspective of the
classroom. The child interview provides another important perspective,
focusing on such issues as how the child uses materials, pursues
learning, understands what is occurring in the classroom, uses the
teacher, and relates to other children. The parent interview, bringing
in a third perspective, is aimed at a description and understanding of
parents' perceptions and attitudes about what is occurring in the
classroom, the degree and kinds of their involvement in the
classroom, what they believe is important, how they view their
children's progress, and their overall level of support (or lack of
support). The three interviews provide an enormous amount of
qualitative evaluation information about classrooms and schools. No
standardized test can provide as much data that can make a
difference in what teachers do and how children learn. This is
especially true when the information gathered in the interviews is
seriously considered and discussed.

Broadening the Base of Operations 

Teachers can keep up on children's progress in such areas as
reading, language development, and math through systematic
observation and frequent conferences (recorded). Can a standardized
achievement test reveal as much as carefully kept records
maintained over a period of time?



*This effort has been supported [illegible] research grant (No. [illegible]-0979).

Many teachers make use of informal reading inventories as a
means of monitoring reading, especially when they wish some rough
comparative information. Brenda Engel recently devised a number
of reading tasks (similar to those used in informal reading
inventories) to sample the reading level in the Cambridge (Massachusetts)
Alternative School. Defining reading as "the ability to get meaning
from the printed page," she categorized children as "those who can
read," "those who are still in the process of acquiring reading," and
"those who are nonreaders." Children are asked individually to read
a story (approximately 100 words in length) silently. The interviewer
says: "Tell me what the story was about." After recounting the story,
the child is then asked to read the story to the interviewer and, after
reading, to add to what he had related previously. In very general
terms, a "reader" is one who "can read the text silently and relate the
story adequately and/or can read the text aloud with fluency." A
"nonreader" is one who "indicates he cannot read the text silently or
is unable, after reading the text silently, to convey the principal
meaning of the story and is unable to read aloud more than a few
sight words." A range of reading needs can be identified in the
process of this reading exercise.

Math checklists which teachers find useful are often provided,
along with the various math programs used in schools.* And
individual teachers or groups of teachers can prepare their own
checklists; they can also devise informal inventories of math
understandings.**

*[Title illegible] ([illegible] Winston, 1974) is a program that provides
particularly effective checklists for teachers and children. The Nuffield Mathematics
Project provides "check-up" guides to determine children's growth in a variety of
concepts. See also Nancy Langstaff, Teaching in an Open Classroom: Informal Checks,
Diagnoses and Learning Strategies for Beginning Reading and Math (Boston: National
Association of Independent Schools, 1975), for some excellent ways of using informal
checks productively. Langstaff's case studies provide a realistic context and should be
useful to teachers and principals. We need many more such descriptions, written by
classroom teachers or careful classroom observers, in order to enlarge teachers'
understandings of such record-keeping and evaluation processes.

**Readers may wonder why there has been no mention yet of criterion-referenced
testing. Many feel that criterion-referenced testing has enormous potential and may be a
useful replacement for existing norm-referenced achievement testing. In many
respects criterion-referenced testing programs would be an improvement. They have the
potential, unlike the norm-referenced testing that has dominated the schools, for
providing some useful information about children's performance in relation to the
direct instructional purposes of teachers or of the particular math, reading, or social

Teachers' Roles

All of the foregoing kinds of records can be particularly helpful to
a teacher in planning learning activities that relate to a specific child
or group of children. Teachers with whom we work feel this is a
critical aspect of their work.

For teachers to make a conscious effort to document in some of
these ways, they must step back and observe from time to time. To
make such observations meaningful it is necessary to have a wide
range of learning activities available for children to engage in during
the observation. Otherwise the activities are so undifferentiated that
the observations will provide limited insight into children and their
learning patterns, interests, and needs. Being free of standardized
tests might encourage such classroom environments.

Children's Roles

How can children themselves contribute to alternatives to the
testing? When children participate in record keeping (maintaining
daily or weekly journals, keeping sample files of their writing, recording the
books they have read or the math concepts they understand), they
not only provide information to the teacher but they have an
increased sense of where they are and what they need to do to
extend their learning. Learning takes on a personal character,
encouraging students to assume greater responsibility for their own
learning. (Can any of the standardized tests do as much?)



[...] criterion-referenced tests have been available [illegible] that lend
themselves to [illegible] statements directly interpretable in terms of [illegible]
([illegible], p. [illegible]). [Illegible] measure specific skills at
the expense of higher-level thought processes. [Illegible citation]
reinforce "mastery of skills in isolation. They
permit no more creativity than the norm-referenced tests." That the behaviors
[illegible] the curriculum will be developed principally to
[illegible] (see [illegible] and Warren Strandberg, "A
[illegible]," [illegible] Record, February, 1972). While I do
[illegible], tests in the [illegible] are still in their formative stages.

Parents' Roles

In addition to children, parents can be actively engaged in the
documentation process. For example, parents can conduct
observations on the use of space and materials in a classroom, the
task persistence of individual children, and various social
relationships. They can also take photographs at various times during the
year to record classroom changes, three-dimensional projects, and
so on, and they can summarize reading biographies, questionnaires,
and other materials. In the process there is potential for parents to
gain increased knowledge about schooling and to enlarge their
overall contribution to their children's education. And, of course,
the information has enormous potential for the classroom teacher.

In Sum 

The foregoing suggestions, as mentioned earlier, are hardly
meant to be all-inclusive; they ought to indicate that the means for
evaluation are accessible if teachers organize their resources for
such a purpose. Teachers need only decide what kinds of records
they want to maintain, recognizing, of course, that they can't do
everything in any one year.

The outcome of engaging in alternative processes such as those 
suggested is the establishment of a basis on which individual 
teachers and schools can improve the quality of their efforts. This,
after all, is what evaluation must do to have any meaning, and it is
what many of us wish to foster.

A Closing Note

What is very clear as I bring this fastback to a close is that so much
has been left out. I can anticipate questions but won't be
close enough to the readers to respond to them. For many of the
readers, the content will appear radical; to others it will appear
conservative. I have, however, attempted to produce a moderate
statement, one that will encourage discussion and promote an
examination of tests, testing practices, and test uses. If the fastback
serves such purposes, my objectives will have been met.